Search CORE

227 research outputs found

Integrating Semantic Knowledge to Tackle Zero-shot Text Classification

Author: Guo Yike
Lertvittayakumjorn Piyawat
Zhang Jingqing
Publication venue
Publication date: 01/01/2019
Field of study

Insufficient or even unavailable training data of emerging classes is a big challenge of many classification tasks, including text classification. Recognising text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult and only limited previous works tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each and the combination of the two phases achieve the best overall accuracy compared with baselines and recent approaches in classifying real-world texts under the zero-shot scenario.Comment: Accepted NAACL-HLT 201

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Vector-based Efficient Data Hiding in Encrypted Images via Multi-MSB Replacement

Author: Luo Wenbin
Zhang Yike
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/02/2023
Field of study

As an essential technique for data privacy protection, reversible data hiding in encrypted images (RDHEI) methods have drawn intensive research interest in recent years. In response to the increasing demand for protecting data privacy, novel methods that perform RDHEI are continually being developed. We propose two effective multi-MSB (most significant bit) replacement-based approaches that yield comparably high data embedding capacity, improve overall processing speed, and enhance reconstructed images' quality. Our first method, Efficient Multi-MSB Replacement-RDHEI (EMR-RDHEI), obtains higher data embedding rates (DERs, also known as payloads) and better visual quality in reconstructed images when compared with many other state-of-the-art methods. Our second method, Lossless Multi-MSB Replacement-RDHEI (LMR-RDHEI), can losslessly recover original images after an information embedding process is performed. To verify the accuracy of our methods, we compared them with other recent RDHEI techniques and performed extensive experiments using the widely accepted BOWS-2 dataset. Our experimental results showed that the DER of our EMR-RDHEI method ranged from 1.2087 bit per pixel (bpp) to 6.2682 bpp with an average of 3.2457 bpp. For the LMR-RDHEI method, the average DER was 2.5325 bpp, with a range between 0.2129 bpp and 6.0168 bpp. Our results demonstrate that these methods outperform many other state-of-the-art RDHEI algorithms. Additionally, the multi-MSB replacement-based approach provides a clean design and efficient vectorized implementation.Comment: 14 pages; journa

arXiv.org e-Print Archive

Self-supervised Registration and Segmentation of the Ossicles with A Single Ground Truth Label

Author: Noble Jack
Zhang Yike
Publication venue
Publication date: 15/02/2023
Field of study

AI-assisted surgeries have drawn the attention of the medical image research community due to their real-world impact on improving surgery success rates. For image-guided surgeries, such as Cochlear Implants (CIs), accurate object segmentation can provide useful information for surgeons before an operation. Recently published image segmentation methods that leverage machine learning usually rely on a large number of manually predefined ground truth labels. However, it is a laborious and time-consuming task to prepare the dataset. This paper presents a novel technique using a self-supervised 3D-UNet that produces a dense deformation field between an atlas and a target image that can be used for atlas-based segmentation of the ossicles. Our results show that our method outperforms traditional image segmentation methods and generates a more accurate boundary around the ossicles based on Dice similarity coefficient and point-to-point error comparison. The mean Dice coefficient is improved by 8.51% with our proposed method.Comment: conferenc

arXiv.org e-Print Archive

OmiEmbed: a unified multi-task deep learning framework for multi-omics data

Author: Guo Yike
Sun Kai
Xing Yuting
Zhang Xiaoyu
Publication venue: 'MDPI AG'
Publication date: 18/05/2021
Field of study

High-dimensional omics data contains intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture them from the genome-wide data due to the large number of molecular features and small number of available samples, which is also called 'the curse of dimensionality' in machine learning. To tackle this problem and pave the way for machine learning aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with the deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on the new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed support multiple tasks for omics data including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy comparing to training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various application of high-dimensional omics data and has a great potential to facilitate more accurate and personalised clinical decision making.Comment: 14 pages, 8 figures, 7 table

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Internet Platform Enterprises and Farmers Digital Literacy Improvement

Author: Bao Wen
Zhang Yike
Publication venue: 'EDP Sciences'
Publication date: 01/01/2023
Field of study

Agricultural and rural informatization is the strategic commanding heights of agricultural and rural modernization, and the improvement of farmers' digital literacy is the core task of digital rural construction. In the digital age, the development of the platform economy has given Internet platform enterprises a new connotation of social responsibility to improve farmers' digital literacy. By strengthening platform governance to enhance farmers' willingness to use digital, promoting science and technology for good to enhance farmers' digital use skills, and innovating digital services to enhance farmers' digital use effects, Internet platform enterprises ultimately achieve the purpose of improving farmers' digital literacy

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification

Author: Dai Chengliang
Guo Yike
Sun Kai
Yang Xian
Zhang Jingqing
Zhang Xiaoyu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Different aspects of a clinical sample can be revealed by multiple types of omics data. Integrated analysis of multi-omics data provides a comprehensive view of patients, which has the potential to facilitate more accurate clinical decision making. However, omics data are normally high dimensional with large number of molecular features and relatively small number of available samples with clinical labels. The "dimensionality curse" makes it challenging to train a machine learning model using high dimensional omics data like DNA methylation and gene expression profiles. Here we propose an end-to-end deep learning model called OmiVAE to extract low dimensional features and classify samples from multi-omics data. OmiVAE combines the basic structure of variational autoencoders with a classification network to achieve task-oriented feature extraction and multi-class classification. The training procedure of OmiVAE is comprised of an unsupervised phase without the classifier and a supervised phase with the classifier. During the unsupervised phase, a hierarchical cluster structure of samples can be automatically formed without the need for labels. And in the supervised phase, OmiVAE achieved an average classification accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and normal samples, which shows better performance than other existing methods. The OmiVAE model learned from multi-omics data outperformed that using only one type of omics data, which indicates that the complementary information from different omics datatypes provides useful insights for biomedical tasks like cancer classification.Comment: 7 pages, 4 figure

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records

Author: Dai Chengliang
Guo Yike
Sun Kai
Yang Xian
Zhang Jingqing
Zhang Xiaoyu
Publication venue
Publication date: 01/01/2019
Field of study

The extraction of phenotype information which is naturally contained in electronic health records (EHRs) has been found to be useful in various clinical informatics applications such as disease diagnosis. However, due to imprecise descriptions, lack of gold standards and the demand for efficiency, annotating phenotypic abnormalities on millions of EHR narratives is still challenging. In this work, we propose a novel unsupervised deep learning framework to annotate the phenotypic abnormalities from EHRs via semantic latent representations. The proposed framework takes the advantage of Human Phenotype Ontology (HPO), which is a knowledge base of phenotypic abnormalities, to standardize the annotation results. Experiments have been conducted on 52,722 EHRs from MIMIC-III dataset. Quantitative and qualitative analysis have shown the proposed framework achieves state-of-the-art annotation performance and computational efficiency compared with other methods.Comment: Accepted by BIBM 2019 (Regular

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository